The robustness of sparse network under limited attack capacity
Wang Xiao-Juan1, Song Mei1, †, Jin Lei1, Wang Zhen2
Electronic Engineering Institute, Beijing University of Posts and Telecommunications, Beijing 100876, China
Electronic Engineering Institute of PLA, Hefei 230000, China

 

† Corresponding author. E-mail: songmbupt@126.com

Project supported by the National Natural Science Foundation of China (Grant No. 61471055).

Abstract

The paper studies the robustness of the network in terms of the network structure. We define a strongly dominated relation between nodes and then use this relation to merge the network. Based on that, we design a dominated clustering algorithm aimed at finding the critical nodes in the network. Furthermore, this merging process is lossless, which means the original structure of the network is kept. In order to realize the visualization of the network, we also apply a lossy consolidation to the network based on the detection of community structures. Simulation results show that, compared with six existing centrality algorithms, our algorithm performs better when the attack capacity is limited. The simulations also illustrate that our algorithm does better in assortative scale-free networks.

1. Introduction

A large number of complex systems can be represented as networks. The research scope of complex networks includes biology,[1,2] social networks,[3,4] communication,[5,6] and so on. The robustness of the network deserves particular attention. It has been proved that large-scale real networks are robust to random attack while they are vulnerable to targeted attack. Most real networks are sparse. Sparse networks[7–10] are those with a small number of edges and few short loops. So, in complex networks, we focus on the study of sparse networks.

Generally, the network will be broken after deleting a part of the nodes chosen by centrality algorithms. The existing centrality algorithms[11,12] include the degree algorithm,[13] the betweenness algorithm,[14] the closeness algorithm,[15] the k-core algorithm,[16] and so on. In real networks, the number of nodes is large but the attack capacity is limited, that is to say, we can only attack a small fraction of the nodes. In this case, deleting the nodes chosen by the algorithms above has little influence on the network, which means the existing centrality algorithms do not perform well when the attack capacity is limited. So in this paper we aim to design an effective centrality algorithm to figure out the critical nodes when the attack capacity is limited.

The process of deleting nodes can be seen as a percolation process in the network. Latora[17] studied the vulnerability and protection of infrastructure networks and put forward a framework to define critical damages, critical improvements, and structural vulnerability. Karrer[18] reformulated percolation as a message-passing process and gave an effective way to calculate the size of the percolating cluster and the average cluster size in sparse networks. Agliari[19] analysed the bond percolation process on correlated random networks and showed different evolutions of the network between stochastic and deterministic processes. Azimi-Tafreshi[20] studied k-core percolation in multilayer networks and presented an analytical framework which enables us to describe the nature of the transitions. Liu[21] identified influential nodes via the SIR model and used dynamic centrality to measure the importance of nodes. Tian[22] focused on the phase transition during percolation. Explosive percolation is found in finite networks; however, explosive percolation will not appear after only a limited number of nodes are deleted. Therefore, the existing centrality algorithms are not effective when the attack capacity is low.

In this paper, we pay attention to the network structure and design a novel algorithm to measure the centrality of nodes. There is a vast body of previous research on network structure. Kanevsky[23] gave a fast algorithm to find the minimum cut vertex sets of a network, from which we can merge the network iteratively. Rosvall[24] used maps of information flow to detect the community structure of a network. We can use the community structure to merge the network, and the nodes which have swallowed many other nodes are considered important to the network. Wang[25] detected the community structure using nonnegative matrix factorization, which has powerful interpretability and a close relationship with clustering methods. Singh[26] found community structure in sparse networks using random walks with reluctant backtracking operators. Liu[27] defined three properties of a community and proposed a simple and novel framework to detect communities by network topology. However, if we merge the network by just using existing community detection algorithms, the resulting critical nodes perform poorly during the attack process.

In this research we define a dominated relation between nodes based on the cut vertices of the network. Then we merge the network using the strongly dominated relation, and we call this process the lossless consolidation. We design a dominated clustering algorithm to figure out the critical nodes by the weights obtained from the merging process. Simulation results show the effectiveness of our algorithm in sparse real networks when the attack capacity is limited. Our algorithm performs better in assortative scale-free networks compared with other centrality algorithms.

The rest of this paper is arranged as follows. Section 2 gives the indicators for network vulnerability and shows the limitations of the existing algorithms. We define the strongly dominated relation and put forward the dominated clustering algorithm in Section 3. In Section 4, the simulation results of deleting nodes are shown. Section 5 concludes the paper.

2. Topology vulnerability
2.1. Indicators for vulnerability

The indicators for measuring the vulnerability of networks are abundant. This paper uses the largest connected component (LCC) to evaluate the current communication capability, in other words, the robustness, of the network. Furthermore, we use the number of connected components (isolated nodes excluded) and the scale of the small components to evaluate the degree to which the network is broken. The vulnerability indicators are defined as follows:

The scale of the LCC Here we use S to represent the number of nodes in the LCC of the network, $v_i$ and $v_j$ represent nodes i and j, and $e_{ij}$ is the edge between them. The LCC is the carrier of the network traffic. Therefore, when the network is under attack, we can use the number of nodes in the LCC to evaluate the overall connectivity of the network. S is also an important indicator of whether the network has failed.

The number of connected components After deleting several nodes, connected components will appear in the network. Considering that the existence of isolated nodes would affect the results, we exclude the isolated nodes from the connected components.

The mean scale of small connected components The small connected components are the set of connected components except the LCC. Here we use $s_i$ to represent the number of nodes in the i-th small connected component and $N_s$ to denote the number of small connected components; the mean scale is then $M = \frac{1}{N_s}\sum_i s_i$.
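For concreteness, a small networkx sketch of these three indicators (the LCC size S, the component count with isolated nodes excluded, and the mean small-component size M) could look as follows; the function name is an illustrative assumption, not code from the paper.

```python
import networkx as nx

def vulnerability_indicators(G):
    """Compute S (size of the LCC), C (number of connected components,
    isolated nodes excluded), and M (mean size of the components other
    than the LCC), following the definitions above."""
    comps = [c for c in nx.connected_components(G) if len(c) > 1]
    if not comps:
        return 0, 0, 0.0
    sizes = sorted((len(c) for c in comps), reverse=True)
    S = sizes[0]              # largest connected component
    small = sizes[1:]         # all the other (small) components
    C = len(comps)            # components, isolated nodes excluded
    M = sum(small) / len(small) if small else 0.0
    return S, C, M
```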

2.2. The limitation of existing algorithms

Generally, the way to find the critical nodes is to use node centrality algorithms. Different centrality indicators measure the node value from different angles, and attacking the network with different centrality algorithms gives different attack performances. Here we recall some existing node centrality indicators. First, we introduce the degree centrality, which is the most commonly used indicator,[13]
$$ DC(i) = k_i, $$
where i represents node i in the network and $k_i$ is the number of edges connected to it; degree centrality thus indicates the value of a node by the number of edges connected to it. Then the betweenness centrality,[28] which counts the number of shortest paths passing through the node, is given by
$$ BC(i) = \sum_{s \neq i \neq t} \frac{g_{st}(i)}{g_{st}}, $$
where $g_{st}(i)$ is the number of shortest paths from node s to node t passing through node i, $g_{st}$ is the total number of shortest paths from node s to node t, and n is the total number of nodes. After that we introduce the closeness centrality, which measures the distance of a node to the other nodes in the network. The value of the closeness centrality[11] reflects the broadcasting time of information in the network,
$$ CC(s) = \frac{n-1}{\sum_{t \neq s} d_{st}}, $$
where $d_{st}$ represents the distance from node s to node t. Eigenvector centrality[29] uses the eigenvectors of the adjacency matrix to indicate the importance of nodes,
$$ EC(i) = x_i, \qquad A x = \lambda_{\max} x, $$
where A is the adjacency matrix: if there is an edge between node i and node j, we set $a_{ij} = 1$; otherwise $a_{ij} = 0$. Besides, $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of A and the corresponding eigenvectors are $x_1, \ldots, x_n$; the centrality is taken from the eigenvector associated with the largest eigenvalue $\lambda_{\max}$. We also use PageRank centrality[30] to indicate the node value, which is defined as
$$ PR(i) = \frac{1-q}{n} + q \sum_{j \in \Gamma(i)} \frac{PR(j)}{k(j)}, $$
where $\Gamma(i)$ is the set of neighbors of node i, k(j) is the number of adjacent nodes of node j, and q is the transition probability. What is more, the k-core centrality is an important indicator, which iteratively deletes the nodes of degree at most k and sets the centrality of the deleted nodes to k (k = 1, 2, 3, …). In this paper, we use $ks_i$ to represent the k-core centrality[12] of node i.

However, the existing centrality algorithms do not always perform well, especially under limited attack capacity. Figure 1 shows the 20 most important nodes of different centrality algorithms in the Router network. From Fig. 1 we can find that different centrality algorithms select different important nodes, although some nodes are chosen by two or more algorithms. This shows that the centrality algorithms measure node importance from various aspects. However, the simulations in the following parts will show the shortcomings of the existing centrality algorithms. In the percolation process of the network, we delete the nodes in the order of their centrality. In simulations, we observed that the phase transition does not appear until at least 5 percent of the nodes are deleted, while deleting 5 percent of the nodes in real networks may exceed the attack capacity. Furthermore, node centrality only measures the importance of a single node, ignoring the linkage effect between nodes. Thus we need to design an algorithm which considers the connections between nodes and indicates the importance of the nodes as well.

Fig. 1. (color online) The visualization of nodes selected by different algorithms.
2.3. Network dataset

In order to show the different performances of the centrality algorithms, we do simulations on real networks and on Barabasi–Albert networks with different assortativities. Tables 1 and 2 give the properties of the networks[31–36] used in our simulations.

Table 1. The real networks.
Table 2. The artificial networks.

Here n is the number of nodes in the network, m is the number of edges, $\langle k \rangle$ is the mean degree, and r is the assortativity coefficient. C represents the clustering coefficient, $k_{\max}$ is the largest node degree, and $k_{\min}$ is the smallest node degree in the network. D measures the density of the network, which is defined as
$$ D = \frac{2m}{n(n-1)}. $$
We can find that for a sparse network, D is very small. The density of a network is relative: if the network is fully connected (D = 1), our algorithm is not applicable, but in that case it is also meaningless to analyze robustness against centrality-based attacks. For most networks, D is small; these networks are sparse networks.

We also compare the performance on artificial networks. BA networks are scale-free networks. Initially we have $m_0$ fully connected nodes. At each time step, we introduce a new node with m edges, and these m edges are connected to existing nodes with probability proportional to their degrees, $\Pi(k_i) = k_i / \sum_j k_j$. After that we can use a structural algorithm to change the assortativity of the network. The assortativity coefficient[37] is defined as
$$ r = \frac{m^{-1}\sum_i j_i k_i - \left[m^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}{m^{-1}\sum_i \tfrac{1}{2}(j_i^2 + k_i^2) - \left[m^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}, $$
where $j_i$ and $k_i$ are the degrees of the two endpoints of the i-th edge. For the structural algorithm, we randomly choose two links $(v_a, v_b)$ and $(v_c, v_d)$ at each step and, depending on a comparison of the endpoint degrees, rewire them to $(v_a, v_c)$ and $(v_b, v_d)$ so that nodes of similar degree become connected (assortative rewiring) or nodes of dissimilar degree become connected (disassortative rewiring). Obviously, in this algorithm the degree distribution does not change. By repeating this process, we can obtain networks with different r.
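As a sketch of this structural, degree-preserving rewiring, assuming the common rule of reconnecting the four endpoints in degree order (the paper's exact selection condition is not reproduced here, and the function name and swap count are our own assumptions), one could write:

```python
import random
import networkx as nx

def rewire_assortative(G, n_swaps=10000, assortative=True, seed=0):
    """Degree-preserving rewiring sketch: pick two edges, reconnect their
    four endpoints so that similar-degree nodes end up linked (assortative)
    or dissimilar-degree nodes end up linked (disassortative)."""
    rng = random.Random(seed)
    G = G.copy()
    for _ in range(n_swaps):
        (a, b), (c, d) = rng.sample(list(G.edges()), 2)
        nodes = {a, b, c, d}
        if len(nodes) < 4:
            continue                       # need four distinct endpoints
        s = sorted(nodes, key=G.degree)    # order endpoints by degree
        if assortative:
            new_edges = [(s[0], s[1]), (s[2], s[3])]   # low-low, high-high
        else:
            new_edges = [(s[0], s[3]), (s[1], s[2])]   # low-high pairs
        if any(G.has_edge(u, v) for u, v in new_edges):
            continue                       # avoid creating multi-edges
        G.remove_edges_from([(a, b), (c, d)])
        G.add_edges_from(new_edges)
    return G

G0 = nx.barabasi_albert_graph(1000, 4, seed=1)
G1 = rewire_assortative(G0, assortative=True)
print(nx.degree_assortativity_coefficient(G0),
      nx.degree_assortativity_coefficient(G1))
```

Each swap keeps every node's degree unchanged, so only the degree correlations (and hence r) are altered.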

3. Network mergence based on dominated relation
3.1. The dominated relation

In this paper, we analyze the network using a new relation between nodes, which we call the dominated relation. For an undirected network G(V, E), we use V to represent the vertex set and E to represent the edge set. To fully understand the dominated relation between nodes, we give the definitions below.

In a 1-cut network, let node a be one of the nodes whose degree is 1, and let node b be the node connected with a. From the theory above, we say that node a is dominated by node b; in other words, there is a dominated relation between nodes a and b. We also call node b a minimum cut of the network. Then we can merge the dominated nodes whose degree is one. This process can be repeated until there is no node of degree 1.

Then we focus on the situation when the network is 2-cut. In this situation we can say that the nodes which connect only to the minimum cut are dominated by the minimum cut of the network. However, this relation is not reliable for deciding the critical vertices. In this paper we define a strongly dominated relation between nodes. Suppose one minimum cut of the network is (a, b) and the dominated nodes of (a, b) are $\{c_1, c_2, \ldots, c_m\}$. We use $P_a$, $P_b$, and $P_{c_i}$ to represent the profit of deleting only node a, b, or $c_i$ (the profit of a node set represents the degree of damage its deletion causes to the network), and $P_{ab}$ is the profit from deleting nodes a and b together. If
$$ P_{ab} > P_a + P_b, $$
then we say that (a, b) and $\{c_1, \ldots, c_m\}$ have a strongly dominated relation between them. Furthermore, we extend this strongly dominated relation to the k-cut situation, where we require
$$ P_{a_1 a_2 \cdots a_k} > \sum_{i=1}^{k} P_{a_i}. $$
Here $\{a_1, \ldots, a_k\}$ is the minimum cut set and $\{c_1, \ldots, c_m\}$ are its dominated nodes.
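The paper does not spell out how the profit is computed; purely for illustration (an assumption on our part, not the authors' definition), one natural choice is the drop in LCC size caused by deleting a node set, which also gives a direct way to test the strongly dominated condition:

```python
import networkx as nx

def profit(G, nodes):
    """Illustrative profit of deleting `nodes`: the drop in the size of the
    largest connected component (one possible measure of damage)."""
    lcc_before = len(max(nx.connected_components(G), key=len))
    H = G.copy()
    H.remove_nodes_from(nodes)
    lcc_after = max((len(c) for c in nx.connected_components(H)), default=0)
    return lcc_before - lcc_after

# Strongly dominated check for a 2-cut (a, b), as an example:
# is_strong = profit(G, [a, b]) > profit(G, [a]) + profit(G, [b])
```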

3.2. The dominated clustering algorithm

Using the strongly dominated relation between nodes, we can design the dominated clustering algorithm (DCA) to find the critical vertices. Finding the important nodes amounts to finding the nodes with high profit. In our algorithm, we merge nodes step by step and use two lists to save the node information (value and size): value is the profit of the node, and size keeps the indices of the other nodes merged into the node.

Generally, we suppose there are one-degree nodes in the network, so from Lemma 1 the network is 1-cut. At the first step, we initially set the value of every node to 1, and its size list keeps only its own index. After that, we delete the dominated nodes and add their values to the values of the corresponding minimum cut nodes. The values of the deleted nodes are set to 0. According to Lemma 2, this process is repeated until the network has no one-degree nodes.

Then we deal with the 2-cut situation. Let (a, b) be a minimum cut and $\{c_1, \ldots, c_m\}$ its dominated nodes. The profits of the candidate merge can be evaluated from the stored node information; here we use v to represent the value of a node and s to represent the number of nodes in its size list. If the strongly dominated condition is satisfied, we delete b and $c_1, \ldots, c_m$ and set their values to zero. Then we connect node a to the nodes that node b used to be connected with. The value of a is set as the profit of (a, b), which is
$$ v_a \leftarrow v_a + v_b + \sum_{i=1}^{m} v_{c_i}. $$

During the process of dealing with the 2-cut situation, new one-degree nodes may appear in the network. We simply handle these nodes with the 1-cut method. When the network is k-cut, we can extend our algorithm accordingly: $\{a_1, \ldots, a_k\}$ is the minimum cut set and $\{c_1, \ldots, c_m\}$ are its dominated nodes.

At the merging step, we merge the k minimum cut nodes in the same way as in the 2-cut case, and the value is set as
$$ v_{a_1} \leftarrow \sum_{i=1}^{k} v_{a_i} + \sum_{j=1}^{m} v_{c_j}. $$

Figure 2 shows several special cases for 1-cut, 2-cut, and 3-cut. We also show the merging method of each case.

Fig. 2. (color online) Special cases of the merging method for 1-cut, 2-cut, and 3-cut.

Overall, our algorithm iterates the 1-cut, 2-cut, and k-cut merging steps described above, checking the strongly dominated condition before each merge, until no further merge is possible.

From the k-cut condition, we can see that the larger k is, the harder the strongly dominated relation is to satisfy. Generally, for real networks, dealing with the k-cut situation up to k = 4 is enough. In this process, no information about the network is lost, so we call this process the lossless consolidation.
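For illustration, a minimal Python sketch of the 1-cut step of DCA might look as follows; the bookkeeping follows the value/size description above, while the helper name and data layout are our own assumptions rather than the authors' implementation. The 2-cut and k-cut steps would extend this by checking the strongly dominated condition before each merge.

```python
import networkx as nx

def dca_merge_degree_one(G):
    """Sketch of the 1-cut step: repeatedly merge degree-1 nodes into their
    neighbour, adding the merged node's value to the neighbour's value and
    recording its index in the neighbour's size list."""
    H = G.copy()
    value = {v: 1 for v in H}          # initial value of every node is 1
    size = {v: [v] for v in H}         # size keeps the indices of merged nodes
    leaves = [v for v in H if H.degree(v) == 1]
    while leaves:
        v = leaves.pop()
        if v not in H or H.degree(v) != 1:
            continue
        u = next(iter(H.neighbors(v)))  # the node that dominates v
        value[u] += value[v]
        size[u].extend(size[v])
        value[v] = 0
        H.remove_node(v)
        if H.degree(u) == 1:            # u may become a new degree-1 node
            leaves.append(u)
    return H, value, size
```

Ranking nodes by their final value after the full lossless consolidation (including the 2-cut and k-cut steps) would give the DCA centrality.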

Figure 3 shows a small network with 52 nodes and 87 edges. In Step 1 of our algorithm, we find all the one-degree nodes and merge them into the nodes they are connected to. At this moment, no one-degree vertex remains in the network. In Step 2, we first find all the two-degree nodes; then we focus on the nodes connected to these two-degree nodes and merge them into one node. For instance, we merge node A and the other nodes in the same circle into node A. Now we notice that one-degree nodes appear again in the network, so we repeat Step 1; Step 3 is actually the same as Step 1. At this point, we turn our attention to the cut points of the network. In Step 4, we find the 1-cut points, for instance node C, and merge the connected nodes into them, which actually merges the nodes in the red circle into node C. In Step 5, 2-cut points are found and the nodes they connect are merged; for example, the nodes in the red circle are merged into node D. Throughout this process, we must check whether the nodes to be merged satisfy the combination condition before every merge.

Fig. 3. (color online) The schematic diagram of mergence process.

A characteristic of our algorithm is that it focuses on the local structure of the network through cut nodes, which lets us understand the network structure more clearly. The existing centralities can be divided into three classes: neighborhood-based centralities, path-based centralities, and iterative refinement centralities. Neighborhood-based centralities include degree, k-core, H-index, and so on; these centralities focus on the neighbor nodes of the central node. Path-based centralities include betweenness, closeness, and subgraph centrality; these centralities concentrate on the shortest paths between nodes. Iterative refinement centralities include eigenvector, PageRank, HITS, and so on; these centralities are based on the idea that the influence of a node is determined not only by the number of its neighbor nodes but also by the influence of those neighbors, so every node gets support from its neighbors. However, there is no centrality which concentrates on local dominated relations between nodes. In our algorithm, we calculate the node centrality through node mergence, which is another characteristic of our algorithm.

After the lossless consolidation, we can also perform a lossy consolidation, which means the network is further merged after applying our algorithm. The significance of the lossy consolidation is the visualization of the network: a real network has too many nodes to show its details, so we merge the nodes first and then the structure of the network is easy to display. In the lossy consolidation, we first do community detection on the merged network. In every community, there are nodes which are connected with nodes in other communities; we call these nodes gate nodes. In a community C, $\{u_1, \ldots, u_p\}$ are the nodes which are connected only with nodes in C, and $\{g_1, \ldots, g_q\}$ are the gate nodes. The profit of each node $u_i$ is the value it accumulated during the lossless consolidation. In every community, we keep the gate nodes; for a node $u_i$, we keep it if its profit is sufficiently large, otherwise we delete it. Obviously, this process does not preserve the strongly dominated relation and some information about the network is lost, so we call this process the lossy consolidation. The network after this process is easy to visualize.
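A rough sketch of this lossy step under stated assumptions (greedy modularity communities as the detection step, and a simple keep-fraction threshold standing in for the unspecified retention condition) might be:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def lossy_consolidation(G, values, keep_fraction=0.2):
    """Illustrative lossy consolidation: detect communities, always keep
    'gate' nodes (nodes with at least one edge leaving their community),
    and keep only the highest-profit interior nodes. `values` maps
    node -> profit from the lossless consolidation; keep_fraction is an
    assumed threshold, not the paper's condition."""
    communities = greedy_modularity_communities(G)
    membership = {v: i for i, comm in enumerate(communities) for v in comm}
    kept = set()
    for comm in communities:
        gates = {v for v in comm
                 if any(membership[u] != membership[v] for u in G.neighbors(v))}
        interior = sorted(set(comm) - gates,
                          key=lambda v: values.get(v, 0), reverse=True)
        kept |= gates
        kept |= set(interior[:max(1, int(keep_fraction * len(interior)))])
    return G.subgraph(kept).copy()
```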

3.3. The merging performance of networks

To show the efficiency of our algorithm, we do simulations on the datasets. Figure 4(a) is the visualization of the Router network, which has 23441 nodes. Figure 4(b) is the Router network after mergence, which has 1164 nodes. We can use the network with 1164 nodes, together with the merge data, to represent the whole network. It is easy to observe that Fig. 4(a) is too complicated to show the structure of the network, while Fig. 4(b) shows the details of the network.

Fig. 4. (color online) Visualization of the Router network before (a) and after (b) mergence.

Table 3 shows the merging performance on different networks, and we can find that the dominated clustering algorithm can effectively merge the networks. From the lossless consolidation we obtain the strongly dominated relations in the network, and then we use the node weights of the merged network to find the critical nodes.

Table 3. The merging performance of different networks.
4. Simulation
4.1. The correlation analysis on algorithms

To verify the effectiveness of the algorithm, we implement the proposed DCA in Python using Spyder. We compare the simulation results of different algorithms on different networks. In the simulations we use the networkx package; nx.degree(), nx.betweenness_centrality(), nx.pagerank(), nx.closeness_centrality(), nx.eigenvector_centrality(), and nx.k_core() can be used to calculate the six centralities. In order to guarantee the accuracy of the simulation results, we expand the scale of the networks and repeat the experiments to reduce the error; the simulation results are averages over the experimental data.
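As a rough illustration of this setup (not the authors' exact scripts), the following sketch computes the six centralities with networkx on a stand-in graph and measures the LCC after removing the top-ranked 0.4% of nodes; nx.core_number() is used here in place of nx.k_core() to obtain a per-node core index, and the BA graph stands in for the real datasets of Table 1.

```python
import networkx as nx

G = nx.barabasi_albert_graph(2000, 2, seed=0)   # stand-in for a real dataset

centralities = {
    "degree":      dict(G.degree()),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank":    nx.pagerank(G),
    "closeness":   nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    "k-core":      nx.core_number(G),            # per-node core index
}

q = 0.004                                        # attack 0.4% of the nodes
k = max(1, int(q * G.number_of_nodes()))
for name, scores in centralities.items():
    targets = sorted(scores, key=scores.get, reverse=True)[:k]
    H = G.copy()
    H.remove_nodes_from(targets)
    S = max(len(c) for c in nx.connected_components(H))
    print(f"{name:12s} LCC after attack: {S}")
```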

In this part, we analyze the correlation between the dominated clustering algorithm and the existing algorithms. Table 4 shows the top ten nodes obtained by our algorithm together with their value and rank in the other centrality algorithms (in Table 4, the first item is the value and the second item is the rank). We can find that the nodes chosen by our algorithm rank differently in the other algorithms, which shows that our algorithm measures the importance of nodes from a new angle. The simulations below demonstrate that the important nodes found by our algorithm are conducive to network attack under limited capacity.

Table 4. The top ten nodes of DCA and their value and rank in the other centrality algorithms.
4.2. Comparison of attack performance

Figure 5 shows the comparison between the dominated clustering algorithm and other centrality algorithms. Figure 5(a) measures the change of S during the process of deleting nodes: after deleting 0.18 percent of the nodes, the dominated clustering algorithm performs best, and S declines steadily. Figure 5(b) shows the change of C. We can find that the dominated clustering algorithm generates more clusters than the other algorithms, which means it damages the network more thoroughly. Figure 5(c) shows the change of M; the dominated clustering algorithm produces more clusters, and its S stabilizes at around 30. We also use a community detection algorithm (the greedy algorithm) to cluster nodes and then use the values of the nodes to indicate their importance. From Fig. 5 we find that if we just use the community detection algorithm, the attack performance is poor.

Fig. 5. (color online) The comparison of algorithms in router network.

Attacking the network using a single static indicator is a somewhat naive approach. Many existing studies describe this process as the influence maximization problem (IMP):[39] given a network G(V, E), the target is to find a set S that maximizes a function f(S). In this paper, f(S) represents the degree of damage to the network, which is related to the LCC. Here we use a heuristic algorithm.[38] We obtain the nodes to remove by adaptive recalculation, that is, we first choose the node with the largest centrality, remove it, and then recalculate the centrality before choosing the next node. We apply this to degree and betweenness because of their better performance. A greedy algorithm can also be used for choosing nodes: we take the top n nodes of each centrality as a strategy set, and at every removal step we choose the node which maximizes f(S).
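A minimal sketch of this adaptive recalculation heuristic (illustrative only; the function name and parameters are assumptions):

```python
import networkx as nx

def adaptive_attack(G, centrality=nx.betweenness_centrality, budget=50):
    """Adaptive recalculation: repeatedly remove the node with the currently
    largest centrality, recomputing the centrality after every removal.
    Returns the LCC size after each removal."""
    H = G.copy()
    lcc_sizes = []
    for _ in range(budget):
        scores = centrality(H)
        target = max(scores, key=scores.get)
        H.remove_node(target)
        lcc_sizes.append(max((len(c) for c in nx.connected_components(H)),
                             default=0))
    return lcc_sizes

# For an adaptive degree attack, pass centrality=lambda g: dict(g.degree()).
```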

In Fig. 5(d), we find that if we remove nodes while recalculating the betweenness, the attack performance improves. However, this process is very time-consuming because of the computational complexity of betweenness. Even so, our algorithm still performs best when the attack capacity is limited.

In this section, we pay more attention to the performance of the algorithms on real networks. Tables 5–7 show the LCC of the network after deleting 0.4%, 0.7%, and 1% of the nodes, respectively. Each table gives the LCC of the seven real networks after deleting a certain percentage of nodes based on the six centrality algorithms and the proposed DCA. The highlighted number is the minimum of each row; in other words, the corresponding algorithm has the best performance in that situation. From the three tables, it can be seen that the proposed DCA is more effective when the number of attacked nodes is small, that is, when the attack capacity is limited. When the number of attacked nodes is allowed to be larger, the advantage of our algorithm becomes less obvious. We also notice that DCA performs better when the network is sparser.

Table 5. The performance in the real networks (q = 0.4%).
Table 6. The performance in the real networks (q = 0.7%).
Table 7. The performance in the real networks (q = 1%).

To show more detailed information about our algorithm, we display in Fig. 6 the results of deleting a fraction of nodes ranging from 0 to 1 in the Router network. We can find that when the attack capacity is not limited, the attack performance is similar for betweenness, degree, PageRank, and our algorithm. We can also see a percolation phase transition in the process of deleting nodes. Furthermore, our algorithm focuses on the limited attack capacity: when the proportion of deleted nodes is below 1%, our algorithm is obviously better than the other algorithms. In reality, it is impossible to have infinite attack capacity; for large networks, 1% of the nodes may already be hundreds of nodes, so our algorithm is more practical.

Fig. 6. (color online) The attack process of router network.

After that, we study the comparison between the attack performance of the dominated clustering algorithm (DCA) we proposed and that of other centrality algorithms. Different from the simulations before, this section focuses on the performance in the BA networks with different assortativity coefficients. To simplify the simulation and make the figures clearer, here, we choose the degree algorithm (DA) to represent the centrality algorithms because of its popularity and good performance.

The simulation results are shown in Fig. 7. The networks in this section are all BA networks with 1000 nodes. The networks in Figs. 7(a) and 7(b) have 4000 edges, while the networks in Figs. 7(c) and 7(d) have 2000 edges. Figures 7(a) and 7(c) give the comparison on more disassortative networks, while Figs. 7(b) and 7(d) correspond to more assortative networks. Generally, Fig. 7 shows that DCA performs better than DA when only a few nodes are deleted. As the number of deleted nodes increases, the performance curves of the two algorithms intersect. Moreover, the more assortative the network is, the later the intersection appears, that is, the intersection moves to the right. These results further illustrate that the proposed DCA is more useful for network attack under limited capacity and performs better when the networks are more assortative.

Fig. 7. (color online) The comparison of different performance in BA network. (a) Disassortative BA networks with 4000 edges, (b) assortative BA networks with 4000 edges, (c) disassortative BA networks with 2000 edges, (d) assortative BA networks with 2000 edges.
5. Conclusion

In this paper, we focus on the dominated relation between nodes in a network. Based on the minimum cut sets of the network, we define a strongly dominated relation between nodes, and we can then merge the network based on this relation. Moreover, the merging process is lossless because it only merges dominated nodes. We design the dominated clustering algorithm with an effective data structure to realize this process; with DCA we can find the critical nodes in the network. We also design a lossy consolidation algorithm which is helpful for visualization. In the simulations we compare the performance of our algorithm with other centrality algorithms. The results show that when the attack capacity is limited, DCA performs best in the real networks, and DCA is especially effective when the network is sparse. From the simulations on scale-free networks, we find that DCA works better in assortative scale-free networks.

References
[1] Cai Q Liu J 2016 Sci. Rep. 6 35904
[2] Baldassano S N Bassett D S 2016 Sci. Rep. 6 26087
[3] Zhang Y Zhou S Guan J et al. 2011 Phys. Rev. E 87 1079
[4] Barabasi A L Jeong H Néda Z et al. 2015 Veterinary Surgery 6 66
[5] Gao J Liu Y Y D’Souza R M et al. 2014 Nat. Commu. 5 5415
[6] Ito S Sagawa T 2013 Phys. Rev. Lett. 111 81
[7] Mitrović M Tadić B 2009 Phys. Rev. E 80 026123
[8] Karrer B Newman M E Zdeborová L 2014 Phys. Rev. Lett. 113 208702
[9] Decelle A Krzakala F Moore C et al. 2011 Phys. Rev. Lett. 107 065701
[10] Luccioli S Olmi S Politi A et al. 2012 Phys. Rev. Lett. 109 138103
[11] Kitsak M Gallos L K Havlin S et al. 2010 Nat. Phys. 6 888
[12] Dorogovtsev S N Goltsev A V Mendes J F 2006 Phys. Rev. Lett. 96 185
[13] Elbirt B 2007 The nature of networks: A structural census of degree centrality across multiple network sizes and edge densities Ph. D. Dissertation Buffalo State University of New York
[14] Freeman L C 1977 Sociometry 40 35
[15] Freeman L C 1978 Social Networks 1 215
[16] Batagelj V Zaversnik M 2003 Comput. Sci. 1 34
[17] Latora V Marchiori M 2005 Phys. Rev. E 71 015103
[18] Karrer B Newman M E Zdeborová L 2014 Phys. Rev. Lett. 113 208702
[19] Agliari E Cioli C Guadagnini E 2011 Phys. Rev. E 84 247
[20] Azimi-Tafreshi N Gómez-Gardeñes J Dorogovtsev S N 2014 Phys. Rev. E 90 032816
[21] Liu J G Lin J H Guo Q et al. 2015 Sci. Rep. 6 032812
[22] Tian L Shi D N 2012 Phys. Lett. A 376 286
[23] Kanevsky A 1993 Networks 23 533
[24] Rosvall M Bergstrom C T 2008 Proc. Natl. Acad. Sci. USA 105 1118
[25] Wang F Li T Wang X et al. 2011 Data Mining and Knowledge Discovery 22 493
[26] Singh A Humphries M D 2015 Sci. Rep. 5 8828
[27] Liu W Pellegrini M Wang X 2014 Sci. Rep. 4 5739
[28] Brandes U 2008 Social Networks 30 136
[29] Bonacich P 1987 American Journal of Sociology 92 1170
[30] Langville A N Meyer C D 2005 Siam Review 47 135
[31] Bu D Zhao Y Cai L et al. 2003 Nucleic Acids Research 31 2443
[32] http://vlado.fmf.uni-lj.si/pub/networks/data/
[33] Watts D J Strogatz S H 1998 Nature 393 440
[34] Adamic L Glance N Adamic L et al. 2005 IL Nuovo Cimento A 62 127
[35] Roget P M Dutch R A 1982 Roget's Thesaurus of English Words and Phrases Longman 90
[36] Knuth D E 1993 The Stanford GraphBase: A Platform for Combinatorial Computing Reading Addison-Wesley 41 43
[37] Newman M E 2002 Phys. Rev. Lett. 89 208701
[38] Holme P Kim B J Yoon C N Han S K 2002 Phys. Rev. E 65 056109
[39] Kempe D Kleinberg J Tardos E 2003 Proceedings of the 9th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining August, 24–27, 2003 Washington, DC, USA 137 146